Utilizing Generalized Learning Automata for Finding Optimal Policies in MMDPs

Authors

Samaneh Assar

Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

Behrooz Masoumi

Faculty of Computer and Information Technology Engineering, Qazvin Branch, Islamic Azad University, Qazvin, Iran

Abstract

Multi-agent Markov decision processes (MMDPs), the generalization of Markov decision processes to the multi-agent case, have long been used to model multi-agent systems and provide a suitable framework for multi-agent reinforcement learning. This paper proposes an algorithm based on generalized learning automata for finding optimal policies in MMDPs. In the proposed algorithm, the MMDP problem is described as a directed graph whose nodes are the states of the problem and whose directed edges represent the actions that cause a transition from one state to another. Each state of the environment is equipped with a generalized learning automaton whose actions correspond to moving to the states adjacent to that state. Each agent moves from state to state and tries to reach the goal state; in each state, the agent chooses its next transition with the help of that state's generalized learning automaton. Experimental results show that the proposed algorithm achieves better learning performance than existing learning algorithms in terms of the speed of reaching the optimal policy.
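
The graph formulation in the abstract can be illustrated with a small Python sketch. This is not the authors' algorithm, and every name in it (GeneralizedLA, walk, train, greedy_path, the toy graph) is hypothetical. It assumes a common generalized-learning-automaton form in which action probabilities are a softmax of theta_a . x over a context vector x (here a constant bias feature). It also assumes a REINFORCE-style reward-inaction update and a made-up reinforcement signal: beta = 1 when a walk's cost equals the best cost found so far. The agents share the per-state automata and walk one after another without interacting, which is a simplification of a true MMDP with joint actions.

```python
import math
import random


class GeneralizedLA:
    """Hypothetical per-state automaton: P(a) = softmax(theta_a . x) over the state's out-edges."""

    def __init__(self, actions, context=(1.0,), lr=0.1):
        self.actions = list(actions)  # adjacent states reachable from this state
        self.x = list(context)        # assumed constant context vector (bias only)
        self.lr = lr
        self.theta = {a: [0.0] * len(self.x) for a in self.actions}

    def probs(self):
        scores = [sum(w * xi for w, xi in zip(self.theta[a], self.x)) for a in self.actions]
        top = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return {a: e / total for a, e in zip(self.actions, exps)}

    def choose(self, rng):
        p = self.probs()
        return rng.choices(self.actions, weights=[p[a] for a in self.actions])[0]

    def update(self, chosen, beta):
        """REINFORCE-style step: theta_a += lr * beta * (1[a == chosen] - p_a) * x."""
        if beta == 0:
            return  # reward-inaction: a penalty leaves the automaton unchanged
        p = self.probs()
        for a in self.actions:
            coeff = self.lr * beta * ((1.0 if a == chosen else 0.0) - p[a])
            self.theta[a] = [w + coeff * xi for w, xi in zip(self.theta[a], self.x)]


def walk(graph, automata, start, goal, rng, max_steps=50):
    """One agent follows the automata from start; returns (edges taken, total cost)."""
    state, path, cost = start, [], 0.0
    for _ in range(max_steps):
        if state == goal:
            return path, cost
        if not graph[state]:
            return path, math.inf  # dead end that is not the goal
        nxt = automata[state].choose(rng)
        path.append((state, nxt))
        cost += graph[state][nxt]
        state = nxt
    return path, (cost if state == goal else math.inf)


def train(graph, start, goal, n_agents=2, episodes=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    # One automaton per state that has outgoing edges; all agents share them.
    automata = {s: GeneralizedLA(nbrs, lr=lr) for s, nbrs in graph.items() if nbrs}
    best = math.inf
    for _ in range(episodes):
        for _agent in range(n_agents):
            path, cost = walk(graph, automata, start, goal, rng)
            if math.isinf(cost):
                continue  # goal not reached: treated as a penalty (no update)
            best = min(best, cost)
            beta = 1 if cost <= best else 0  # assumed signal: reward walks matching the best cost
            for state, nxt in path:
                automata[state].update(nxt, beta)
    return automata


def greedy_path(automata, start, goal, max_steps=50):
    """Read off the learned policy by taking each automaton's most probable action."""
    state, path = start, [start]
    while state != goal and state in automata and len(path) <= max_steps:
        p = automata[state].probs()
        state = max(p, key=p.get)
        path.append(state)
    return path


# Toy graph (hypothetical): edge weights are transition costs; s->a->g (cost 2) is optimal.
graph = {"s": {"a": 1.0, "b": 4.0}, "a": {"g": 1.0, "b": 1.0}, "b": {"g": 1.0}, "g": {}}
automata = train(graph, "s", "g")
print(greedy_path(automata, "s", "g"))
```

Reward-inaction is used here because, once the optimal path has been found, its action probabilities are never pushed down. A different reinforcement signal would change only the `beta` line, and richer per-state features would change only `context`.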

Similar articles

Learning Automata based Algorithms for Finding Optimal Policies in Fully Cooperative Markov Games

Markov games, as the generalization of Markov decision processes to the multi agent case, have long been used for modeling multi-agent systems. In this paper, several learning automata based multi-agent system algorithms for finding optimal policies in fully-cooperative Markov Games are proposed. In the proposed algorithms, Markov problem is described as a directed graph in which the nodes are ...

Learning Automata Based Multi-agent System Algorithms for Finding Optimal Policies in Markov Games

Markov games, as the generalization of Markov decision processes to the multi-agent case, have long been used for modeling multi-agent systems (MAS). The Markov game view of MAS is considered as a sequence of games having to be played by multiple players while each game belongs to a different state of the environment. In this paper, several learning automata based multiagent system algorithms f...

Finding Optimal Refueling Policies in Transportation Networks

We study the combinatorial properties of optimal refueling policies, which specify the transportation paths and the refueling operations along the paths to minimize the total transportation costs between vertices. The insight into the structure of optimal refueling policies leads to an elegant reduction of the problem of finding optimal refueling policies into the classical shortest path proble...

A linear-time algorithm for finding optimal vehicle refueling policies

We explore a fixed-route vehicle refueling problem as a special case of the inventory-capacitated lot-sizing problem, and present a linear-time greedy algorithm for finding optimal refueling policies.

On Finding Optimal Policies for Markovian Decision Processes Using Simulation

A simulation method is developed, to find an optimal policy for the expected average reward of a Markovian Decision Process. It is shown that the method is consistent, in the sense that it produces solutions arbitrarily close to the optimal. Various types of estimation errors are examined, and bounds are developed.

Journal title:
Journal of Computer and Robotics

Volume 6, Issue 2, Pages 15-22
